
Table 11.1  Molecular biology foci and important databases and software

Molecular biology focus | Important databases/software
How do cells read their genome? | NCBI, EMBL, EBI, BLAST, Rfam, RNAAnalyzer, RNAfold, SMART, PDB, SCOP, CATH, ProDom, Pfam
How do cells control gene expression? | GEO, GENEVESTIGATOR, cBioPortal, TCGA, TESS, ALGGEN PROMO, Genomatix, MEME Suite, iRegulon, miRanda, TargetScan, STRING, KEGG, Roche Biochemical Pathways, STITCH, DrumPID
How cells localize, transport and secrete proteins | KEGG, PyMOL, RasMol, Ramachandran Plot, ELM Server, TMHMM
How cells build a solid skeleton and move actively | ExPASy, PROSITE, ProDom, PlateletWeb, MUSCLE, EMA, Metatool, YANAsquare
How do cells communicate? | STRING, iHOP, PRODORIC, SQUAD, Jimena, SWISS-MODEL, I-TASSER, LOMETS, QUARK, Rosetta

on a given function or sequence is provided by domain databases; in addition to SMART and EMBL, ProDom and Pfam are particularly important. The three-dimensional structures of many proteins are stored in the protein structure database PDB, and details of their architecture (their structural classification) are stored in the structure databases SCOP and CATH.

How Do Cells Control Gene Expression?

Interestingly, at any given moment only a fraction of the genome information is transcribed into RNA molecules. The question is: how do I quickly find out bioinformatically which RNA is synthesized in which cell type? A good resource for this is the GEO (Gene Expression Omnibus) database, which holds detailed data from numerous gene expression experiments on different organisms, tissues and diseases. A similar database is GENEVESTIGATOR. The cBioPortal and The Cancer Genome Atlas (TCGA) databases focus on cancer. Because these experiments usually measure all transcripts of a cell, existing data can also be used to infer how a gene of interest is regulated. For this purpose, GEO, GENEVESTIGATOR, cBioPortal and TCGA also provide statistical analysis tools. Next, there is promoter analysis software, which allows me to determine which regulatory sequences switch a gene on and off. Simple programs such as TESS or ALGGEN PROMO merely list the potential binding sites for transcription factors, usually far too many of them. In addition, there are better, but often commercial, programs such as Genomatix. Among other things, these check which of the many binding sites are conserved within a gene family and therefore presumably do regulate transcription. They also detect so-called modules, combinations of binding sites (e.g. for three specific transcription factors) that act together, for example to drive the specific transcription of liver genes, as in liver-specific transcription factor 1 modules. Ab initio approaches such as the MEME Suite and iRegulon offer another way to find unknown transcription factor (TF) motifs and the TFs that regulate a gene.
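Because expression experiments measure many transcripts at once, one simple way to use existing data for a gene of interest is to look for genes whose expression rises and falls together with it across samples. The minimal sketch below ranks candidate regulators by Pearson correlation with a target gene. It assumes a small hypothetical expression matrix: the gene names and values are invented for illustration and are not taken from GEO, GENEVESTIGATOR, cBioPortal or TCGA. A high correlation only suggests co-regulation; it does not prove that one gene regulates the other.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)

def rank_coexpressed(expr, target):
    """Return (gene, r) pairs sorted by decreasing |r| with the target's profile."""
    ref = expr[target]
    scores = [(g, pearson(ref, prof)) for g, prof in expr.items() if g != target]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical log2 expression values across five samples (invented numbers).
expression = {
    "GENE_OF_INTEREST": [2.0, 4.0, 6.0, 8.0, 10.0],
    "TF_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with the target
    "TF_B": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the target rises
    "TF_C": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related
}

for gene, r in rank_coexpressed(expression, "GENE_OF_INTEREST"):
    print(f"{gene}\t{r:+.2f}")
```

Strong negative correlations are kept in the ranking on purpose, since a repressor would be expected to show exactly that pattern.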
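Simple promoter scanners work, in essence, by sliding a position weight matrix (PWM) for a transcription factor along both strands of the promoter and reporting every window that scores above a threshold. The minimal sketch below illustrates this with a made-up count matrix for a hypothetical 4-bp motif (consensus TGAC), a pseudocount and uniform background base frequencies, all of which are assumptions; it is not the scoring scheme of TESS, ALGGEN PROMO or any other specific program. It shows why such scans return many candidate sites: as soon as the threshold tolerates a single mismatch, additional chance hits appear.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan(seq, pwm, threshold):
    """Score every window on both strands and keep those at or above threshold.

    Returns (start, strand, score) tuples; start is a 0-based forward-strand coordinate.
    """
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            score = sum(pwm[j][base] for j, base in enumerate(s[i:i + width]))
            if score >= threshold:
                start = i if strand == "+" else len(seq) - width - i
                hits.append((start, strand, round(score, 2)))
    return hits

# Hypothetical counts for a 4-bp motif with consensus TGAC (invented data).
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pwm = log_odds_pwm(counts)
promoter = "CCTGACAAGTCATT"  # toy sequence, not a real promoter

print(scan(promoter, pwm, threshold=5.0))  # [(2, '+', 7.23), (8, '-', 7.23)]
print(scan(promoter, pwm, threshold=2.0))  # [(2, '+', 7.23), (8, '-', 7.23), (3, '-', 2.84)]
```

On a real promoter several hundred base pairs long, scanned with hundreds of matrices, this effect multiplies, which is why plain scanning output needs further filtering.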
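The conservation and module ideas described above can be illustrated with a toy filter: first keep only the transcription factors whose sites recur in most promoters of a gene family, then check in which promoters one site of each of these factors lies close together. This is a minimal sketch under assumed inputs: the gene family, the TF names, the site positions and both thresholds are hypothetical, and taking the conserved factors directly as the candidate module is a simplification of the sketch. It does not reproduce Genomatix's actual algorithm.

```python
from collections import Counter
from itertools import product

# Hypothetical predicted sites per promoter of one gene family:
# gene -> list of (transcription factor, position relative to the TSS in bp).
family_sites = {
    "gene1": [("TF_X", -120), ("TF_Y", -95), ("TF_Z", -80), ("TF_Q", -300)],
    "gene2": [("TF_X", -210), ("TF_Y", -190), ("TF_Z", -170)],
    "gene3": [("TF_X", -60), ("TF_Y", -40), ("TF_Z", -25), ("TF_R", -500)],
    "gene4": [("TF_X", -400), ("TF_Q", -90)],
}

def conserved_factors(sites_by_promoter, min_fraction):
    """Return TFs that have at least one site in >= min_fraction of the promoters."""
    counts = Counter()
    for sites in sites_by_promoter.values():
        counts.update({tf for tf, _ in sites})  # each TF counted once per promoter
    needed = min_fraction * len(sites_by_promoter)
    return {tf for tf, n in counts.items() if n >= needed}

def has_module(sites, module, max_span):
    """True if one site per TF in `module` can be chosen so all lie within max_span bp."""
    positions = [[pos for tf, pos in sites if tf == factor] for factor in module]
    if any(not p for p in positions):
        return False  # a module member has no site in this promoter
    return any(max(combo) - min(combo) <= max_span for combo in product(*positions))

module = tuple(sorted(conserved_factors(family_sites, min_fraction=0.75)))
with_module = [gene for gene, sites in family_sites.items()
               if has_module(sites, module, max_span=50)]
print(module)       # ('TF_X', 'TF_Y', 'TF_Z')
print(with_module)  # ['gene1', 'gene2', 'gene3']
```

The brute-force search over site combinations is fine for a handful of sites; it grows multiplicatively with the number of sites per factor, so it only serves to make the idea concrete.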

For regulation in the cell, it is also important that proteins control each other. For this, the protein interaction database STRING (EMBL) is very good and comprehensive (and there are

11  Design Principles of a Cell